Section: New Results

Action Detection in Untrimmed Videos

Participants : Abhishek Goel, Michal Koperski, François Brémond.

Problem Statement

The problem addressed in this work is Online Action Detection in Untrimmed Videos. Action detection can be broken down into two major modules: an Action Recognition module and a Temporal Localization module. The Action Recognition module assigns an action label to a trimmed video clip that contains a single action from its start to its end. The Temporal Localization module, on the other hand, determines the start and end of each action present in an untrimmed video. The work was carried out on the Smarthomes dataset [22].
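To make the split between the two modules concrete, here is a minimal Python sketch of one common way a temporal localization step can turn per-frame labels into (label, start, end) segments. It assumes, hypothetically, that a recognition step has already produced one label per frame and that background frames carry the label "No Action". The function name `frames_to_segments` and the action names in the example are illustrative only and are not taken from the original work.

```python
from typing import List, Tuple

NO_ACTION = "No Action"  # hypothetical background label


def frames_to_segments(frame_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group runs of identical frame labels into (label, start, end) segments.

    start and end are inclusive frame indices; No Action runs are dropped.
    """
    segments = []
    start = 0
    for i in range(1, len(frame_labels) + 1):
        # Close the current run at the end of the video or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[start]:
            if frame_labels[start] != NO_ACTION:
                segments.append((frame_labels[start], start, i - 1))
            start = i
    return segments


labels = ["No Action", "drink", "drink", "No Action", "eat", "eat", "eat"]
print(frames_to_segments(labels))  # [('drink', 1, 2), ('eat', 4, 6)]
```

In an online setting the same grouping can be applied incrementally, closing a segment as soon as the predicted label changes.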

Action Detection Framework

Challenges

The two major challenges when processing untrimmed videos online are to identify the intervals in which no action of interest is present (No Action intervals) and to identify the transition from a No Action interval to an interval containing an action of interest. Two new methods were proposed to address these problems.

Proposed Methods

Distance Based Sliding Window

The first method, named "Distance Based Sliding Window", addresses the first problem, identifying the No Action intervals. It defines an actionness criterion based on the distance of a Fisher Vector from the hyperplane of a class of a trained classifier. Figure 20 gives an overview of the proposed approach.
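One way to write the actionness criterion explicitly, assuming a linear one-vs-rest classifier (for example a linear SVM; the text does not state the classifier type) with weight vector w_c and bias b_c for each class c, where x is the Fisher Vector of the candidate clip and T is the distance threshold:

```latex
\hat{c} = \arg\max_{c} \left( \mathbf{w}_c^{\top}\mathbf{x} + b_c \right), \qquad
d(\mathbf{x}) = \frac{\mathbf{w}_{\hat{c}}^{\top}\mathbf{x} + b_{\hat{c}}}{\lVert \mathbf{w}_{\hat{c}} \rVert}, \qquad
\text{label}(\mathbf{x}) =
\begin{cases}
\hat{c} & \text{if } d(\mathbf{x}) > T,\\
\text{No Action} & \text{otherwise.}
\end{cases}
```

Dividing the raw score by the norm of the weight vector turns it into a geometric distance, so a single threshold T is comparable across classes.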

Figure 20. Distance Based Sliding Window approach. First, a candidate (a short clip) is selected using the sliding window. This candidate is passed to the action recognition system, which returns the predicted class together with the distance of the Fisher Vector from the hyperplane of this class. Finally, if this distance is greater than a threshold T, the candidate receives the predicted class label; otherwise it is labeled as a No Action interval.
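The pipeline in Figure 20 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear one-vs-rest classifier stored as a dictionary of (weights, bias) pairs, and a hypothetical `encode_window` callback that returns the Fisher Vector of frames [start, end). The window size, stride and threshold are free parameters.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

NO_ACTION = "No Action"
Vector = Sequence[float]
# Hypothetical one-vs-rest linear classifier: class name -> (weights w_c, bias b_c)
Classifier = Dict[str, Tuple[Vector, float]]


def decision_score(w: Vector, b: float, x: Vector) -> float:
    """Raw linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hyperplane_distance(w: Vector, b: float, x: Vector) -> float:
    """Signed Euclidean distance of x from the hyperplane w.x + b = 0."""
    return decision_score(w, b, x) / math.sqrt(sum(wi * wi for wi in w))


def classify_candidate(classifier: Classifier, x: Vector, threshold: float) -> Tuple[str, float]:
    """Predict a class for one candidate clip, or No Action if it lies too close to the hyperplane."""
    # Predicted class: the one with the largest decision score.
    label = max(classifier, key=lambda c: decision_score(*classifier[c], x))
    w, b = classifier[label]
    d = hyperplane_distance(w, b, x)
    # Actionness criterion: keep the label only if the distance exceeds T.
    return (label if d > threshold else NO_ACTION), d


def sliding_window_detection(
    encode_window: Callable[[int, int], Vector],  # hypothetical Fisher Vector encoder for frames [start, end)
    n_frames: int,
    window: int,
    stride: int,
    classifier: Classifier,
    threshold: float,
) -> List[Tuple[int, int, str]]:
    """Return (first_frame, last_frame, label) for every candidate window."""
    detections = []
    for start in range(0, n_frames - window + 1, stride):
        x = encode_window(start, start + window)
        label, _ = classify_candidate(classifier, x, threshold)
        detections.append((start, start + window - 1, label))
    return detections
```

With a toy two-class classifier `{"drink": ([1.0, 0.0], 0.0), "eat": ([0.0, 1.0], 0.0)}` and T = 1.0, a candidate with vector `[3.0, 1.0]` is labeled "drink" (distance 3.0), while `[0.5, 0.2]` falls below the threshold and becomes "No Action".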

Past and Future Windows

The second method, named "Past and Future Windows", addresses the second problem with a sliding window architecture that uses some future frames to obtain an action label for the current frame. In online action detection, ideally only the frames seen so far are available, and the prediction for the current frame must be made from this information alone. Here, "future" refers to the frames that come after the frame under consideration. Because the label of a frame is predicted only after some of the subsequent frames have been observed, a delay is introduced in the prediction. This delay equals W frames, where W is the window size. Figure 21 gives an overview of the proposed method.

Figure 21. Past and Future Windows approach. For the current frame, two temporal windows of W frames each are considered. The first window contains the past frames and the second the future frames relative to the frame whose label is to be predicted. Each window is passed to the Action Recognition module, which returns an action label with a probability. The final class label is the one with the highest probability.
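A minimal online sketch of the scheme in Figure 21, written in Python. It assumes, hypothetically, that the past window covers frames t-W+1 to t (including the current frame) and the future window frames t+1 to t+W; the text does not say whether the current frame belongs to the past window. `recognize` stands in for the Action Recognition module and is assumed to return a (label, probability) pair for a list of frames. A fixed-size buffer of 2W frames makes the W-frame delay explicit: frame t is labeled only once frame t+W has arrived.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Tuple

# Hypothetical Action Recognition module: frames of one window -> (label, probability)
Recognizer = Callable[[List[object]], Tuple[str, float]]


def past_future_labels(
    stream: Iterable[object], window: int, recognize: Recognizer
) -> Iterator[Tuple[int, str]]:
    """Yield (t, label) for frame t as soon as frame t + window has arrived."""
    # Holds the W past frames (ending at t) followed by the W future frames;
    # maxlen makes the deque discard the oldest frame automatically.
    buffer: Deque[object] = deque(maxlen=2 * window)
    for i, frame in enumerate(stream):
        buffer.append(frame)
        if len(buffer) < 2 * window:
            continue  # not enough context yet for the first labeled frame
        t = i - window  # frame to label: W frames behind the newest one
        frames = list(buffer)
        past = frames[:window]    # frames t-W+1 .. t
        future = frames[window:]  # frames t+1 .. t+W
        past_label, past_prob = recognize(past)
        future_label, future_prob = recognize(future)
        # Keep the prediction of the more confident window.
        yield t, (past_label if past_prob >= future_prob else future_label)
```

For example, with W = 2 and a six-frame stream, the first label (for frame 1) is emitted when frame 3 arrives, which is the W-frame delay described above. This sketch leaves the first W-1 and the last W frames unlabeled; a real system would need an explicit policy for these boundaries.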